Population validity for educational data mining models: A case study in affect detection

نویسندگان

  • Jaclyn Ocumpaugh
  • Ryan Shaun Joazeiro de Baker
  • Sujith M. Gowda
  • Neil T. Heffernan
  • Cristina Heffernan
چکیده

ICT-enhanced research methods such as educational data mining (EDM) have allowed researchers to effectively model a broad range of constructs pertaining to the student, moving from traditional assessments of knowledge to assessment of engagement, meta-cognition, strategy, and affect. The automated detection of these constructs allows EDM researchers to develop intervention strategies that can be implemented either by the software or the teacher. It also allows for secondary analyses of the construct, where the detectors are applied to a data set that is much larger than one that could be analyzed by more traditional methods. However, in many cases, the data used to develop EDM models is collected from students who may not be representative of the broader populations who are likely to use ICT. In order to use EDM models (automated detectors) with new populations, their generalizability must be verified. In this study, we examine whether detectors of affect remain valid when applied to new populations. Models of four educationally relevant affective states were constructed based on data from urban, suburban, and rural students using ASSISTments software for middle school mathematics in the Northeastern United States. We find that affect detectors trained on a population drawn primarily from one demographic grouping do not generalize to populations drawn primarily from the other demographic groupings, even though those populations might be considered part of the same national or regional culture. Models constructed using data from all three sub-populations are more applicable to students in those populations than those trained on a single group, but still do not achieve ideal population validity—the ability to generalize across all sub-groups. In particular, models generalize better across urban and suburban students than rural students. These findings have important implications for data collection efforts, validation techniques, and the design of interventions that are intended to be applied at scale.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...

متن کامل

Evaluation of Data Mining Algorithms for Detection of Liver Disease

Background and Aim: The liver, as one of the largest internal organs in the body, is responsible for many vital functions including purifying and purifying blood, regulating the body's hormones, preserving glucose, and the body. Therefore, disruptions in the functioning of these problems will sometimes be irreparable. Early prediction of these diseases will help their early and effective treatm...

متن کامل

Extending Log-Based Affect Detection to a Multi-User Virtual Environment for Science

The application of educational data mining (EDM) techniques to interactive learning software is increasingly being used to broaden the range of constructs typically incorporated in student models, moving from traditional assessment of student knowledge to the assessment of engagement, affect, strategy, and metacognition. Researchers are also broadening the range of environments within which the...

متن کامل

Comparison of Three Decision-Making Models in Differentiating Five Types of Heart Disease: A Case Study in Ghaem Sub-Specialty Hospital

Introduction: cardiovascular diseases are becoming the main cause of mortality and morbidity in most countries. This research goal was to predict the types of heart diseases for more accurate diagnosis by data mining and neural network technics. Method: This research was an applied-survey study and after data preprocessing, three approaches of neural network, decision making tree and Bayes simp...

متن کامل

Comparison of Three Decision-Making Models in Differentiating Five Types of Heart Disease: A Case Study in Ghaem Sub-Specialty Hospital

Introduction: cardiovascular diseases are becoming the main cause of mortality and morbidity in most countries. This research goal was to predict the types of heart diseases for more accurate diagnosis by data mining and neural network technics. Method: This research was an applied-survey study and after data preprocessing, three approaches of neural network, decision making tree and Bayes simp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • BJET

دوره 45  شماره 

صفحات  -

تاریخ انتشار 2014